A model for mutation in bacterial populations 
We describe the evolution of E.coli populations through
a Bak-Sneppen type model which incorporates random muta- 
tions. We show that, for a value of the mutation level which 
coincides with the one estimated from experiments, this model 
reproduces the measures of mean fitness relative to that of a 
common ancestor, performed for over 10,000 bacterial gener- 
ations. 

PACS numbers: 05.65.+b, 87.23.-n, 89.75.Fb 

The last decade has seen a renewed interest in the 
study of biological evolution. Besides the painstaking 
work of analyzing fossil records, spanning $10^{8}$-$10^{9}$ years,
there are now experiments performed by Lenski and co-workers
with E.coli, which already comprise tens of thousands
of generations [1]. This has opened an entire "experimental
evolution" area, and their data are extremely
useful to study the long-term evolutionary dynamics of 
populations. 

We first briefly review the essentials of Lenski's ex- 
periment. It considered 12 initially identical populations, 
each of them founded by a single cell from an asexual 
clone, propagating in similar environments during 1500 
days, in the following manner. At the beginning of each 
24 hour period, an initial batch of around $5 \times 10^{6}$ bacteria
is placed in a growth medium, and, 24 hours later, when
the population has increased by a factor of about 100,
which implies $\log_2 100 \approx 6.6$ generations, the process is
repeated by starting a new batch with 1% of the bacteria 
present. 
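
The arithmetic of this daily cycle can be checked with a short sketch. It only uses the numbers quoted above (a growth factor of about 100 per day and 1500 days); the function and variable names are illustrative.

```python
import math

GROWTH_FACTOR = 100   # daily population increase quoted in the text
DAYS = 1500           # duration of the experiment quoted in the text


def generations_per_day(growth_factor: float) -> float:
    """Number of doublings needed to grow by `growth_factor`."""
    return math.log2(growth_factor)


gens_per_day = generations_per_day(GROWTH_FACTOR)
total_generations = gens_per_day * DAYS

print(f"{gens_per_day:.2f} generations per day")
print(f"{total_generations:.0f} generations in {DAYS} days")
```

The total is close to the 10,000 generations covered by the measurements.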

The mean cell volume and the mean fitness of the bacterial
populations relative to the ancestor, (RF), were measured
every 100 generations. The (RF) of a population in its
g-th generation is measured by placing a sample of it in
contact with its original ancestor (obtained by unfreezing
a sample taken at time 0, the first generation), and
measuring the ratio of their rates of increase. In
all the experiments the (RF) shows a rapid increase for
$\approx 2000$ generations after the introduction into the
experimental environment, and then becomes practically
constant. The average asymptotic value of the relative fitness
is $\overline{(RF)}_\infty \approx 1.48$ (the bar denotes the
average over the 12 populations). This behavior may be
parameterized by a hyperbolic fit, $f = (A + Bg)/(C + Dg)$,
where $f$ represents the relative fitness of the g-th
generation and A, B, C, D are constants.
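
As an illustration of this parameterization, the sketch below evaluates a hyperbolic curve of the form quoted above. The parameter values are hypothetical: they are not the fitted values of Lenski et al., and were chosen only so that the curve starts at 1 and saturates at 1.48.

```python
def hyperbolic_fitness(g: float, A: float, B: float, C: float, D: float) -> float:
    """Relative fitness f = (A + B*g) / (C + D*g) at generation g."""
    return (A + B * g) / (C + D * g)


# Hypothetical parameters (NOT the published fit): f(0) = A/C = 1 and
# f(g -> infinity) = B/D = 1.48.
PARAMS = dict(A=500.0, B=1.48, C=500.0, D=1.0)

for g in (0, 500, 2000, 10_000):
    print(g, round(hyperbolic_fitness(g, **PARAMS), 3))
```

For this functional form the initial value is A/C and the asymptote is B/D, so the rapid rise and the later plateau are set by separate parameter ratios.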

On the theoretical side, several approaches to the
evolution of species in interactive systems have been
proposed: the well-known models of Kauffman and collaborators
for co-evolving species operating at the edge of
criticality [2], models inspired by them [3, 4], and the
Bak-Sneppen (BS) model [5], among others.

An essential ingredient of evolution theory, comple- 
menting the natural selection mechanism, is the existence 
of spontaneous mutations which produce hereditary dif- 
ferences among organisms. Such an ingredient is not ex- 
plicitly considered in the standard BS model, but it is 
clearly an essential mechanism in the evolution of bacte- 
rial cultures. 

In this work we modify the BS model in order to 
include random mutations so as to explain the E.coli 
results. The reasons for constructing such a model are 
twofold. First, it was experimentally found that the 12 
E.coli replicate populations approached distinct fitness 
peaks (different asymptotic (RF), within a band from $\approx 1.4$
to $\approx 1.6$) [1], supporting the multi-peaked fitness landscape
of the kind assumed in BS-type models. Second,
as the initial populations were identical, and the environ- 
ment for bacterial growth was kept constant, the evolu- 
tion of these quantities resulted solely from spontaneous 
random genetic changes and competition between the dif- 
ferent cell varieties resulting from those changes. 

The model thus assumes two kinds of changes, one
arising from the disappearance of the less fit strains and
another, completely random, that may be attributed,
e.g., to errors in the replication mechanism. All these
changes are associated, in the model, with real mutations in
the genome. In the case of the fitness driven changes, such
mutations appear as a two-step process: the extinction of
a bacterial strain, and its substitution by another, as in
the original BS model. The new random mutations are
associated with changes in the genome unrelated to any
selection process.

Some clarifying remarks concerning the proper interpretation
of the model in the context of the E.coli experiment
are necessary:

i) Despite the fact that the original BS model considered 
evolution in a coarse grained sense (an entire species was 
represented by a single fitness parameter), here we will 
describe a system of evolving bacterial populations rather 
than whole species. 

ii) The focus of the BS model was to study the dynamic
equilibrium properties of an ecology, i.e. its Self-Organized
Critical (SOC) behavior after the initial transient.
The data of [1] consider explicitly the transient
evolution starting from the first bacterial generations. In
particular, changes are observed to be larger for the first
2000 generations, and then gradually taper off. Therefore,
we consider the evolution of the system from its
very initial state, and not after it has equilibrated.

iii) In our model we consider a number N of automaton
cells which, for practical purposes, must take
values much smaller than the number of E.coli in Lenski's
experiment ($5 \times 10^{6}$ to $5 \times 10^{8}$). Below we show that
the model has scale invariance properties that justify this
assumption.

As in the case of BS, the model consists of a
one-dimensional cellular automaton of size N, with cells
labelled by a subindex i = 1, ..., N. Therefore each of these
N cells represents a group of bacteria, and not single
individuals. In other words, the system in the model
is a coarse-grained representation of the cell population.
Each cell is characterized by a real number between
0 and 1, $B_i$. This parameter may be interpreted
as measuring the fitness of the "species" i, i.e. a barrier
against changes.

In order to emulate the experimental condition that
each of the 12 populations was initiated by a single cell
from an asexual clone, we start with the same barrier for
all cells, $B_i = 0.5$. The dynamics of the model consists
in performing the same operations at each step, where a
step corresponds to the time needed for an average E.coli
to divide, i.e. a generation. The operations are the following:

a) eliminate the cell with the lowest fitness;
b) eliminate, on the average, Q other cells;
c) replace the barriers associated with the eliminated cells
by random numbers generated with a uniform distribution in
the interval [0,1], as in the BS model.
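
Operations a) to c) can be sketched in a few lines of Python. This is a minimal illustration rather than the code used for the results reported here. It assumes that "on the average Q other cells" is realized as floor(Q) cells plus one more with probability equal to the fractional part of Q, and that the relative fitness is the mean barrier divided by the ancestral value 0.5. The function name, defaults and seed are illustrative.

```python
import numpy as np


def simulate_relative_fitness(n_cells=1000, q=1.1, n_steps=10_000, seed=0):
    """Return the relative fitness after each step of the model sketch."""
    rng = np.random.default_rng(seed)
    ancestor_fitness = 0.5
    barriers = np.full(n_cells, ancestor_fitness)  # identical initial clone
    q_int = int(q)
    q_frac = q - q_int

    rf = np.empty(n_steps + 1)
    rf[0] = 1.0
    for step in range(1, n_steps + 1):
        # a) the cell with the lowest barrier
        weakest = int(np.argmin(barriers))
        # b) on average q other cells (assumed rule: floor(q) + Bernoulli).
        #    Cells are drawn with replacement, so a repeated index or a
        #    coincidence with `weakest` is possible but rare for large N.
        n_random = q_int + (rng.random() < q_frac)
        others = rng.integers(n_cells, size=n_random)
        # c) new uniform random barriers for all eliminated cells
        hit = np.concatenate(([weakest], others))
        barriers[hit] = rng.random(hit.size)
        rf[step] = barriers.mean() / ancestor_fitness
    return rf


rf = simulate_relative_fitness()
print(rf[::2000])
```

A single run is noisy; the curves discussed in the text are averages over many independent runs, which would correspond here to repeating the call with different seeds and averaging the returned arrays.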

Results do not depend on the choice of distribution
employed in operation c). Operations a) and b) are
associated with fitness driven and random mutations,
respectively.

We should remark that, in order to simulate the
experiment, one would need to double the total number
of cells at the end of each step, assigning copies of the
barriers associated with the existing cells to the ones
created. Furthermore, every $G \approx 6$-$7$ steps, the population
should be reduced to its original value N by selecting at
random this number of cells among the total population.
However, since the mutation probability is found to be
constant and independent of the size of the population,
scaling properties allow us to avoid these population
doublings and reductions, keeping the number of cells
constant. Because of this simplification, the model
becomes formally equivalent to the mean field version of
the BS model (MFBS) [6]. However, here the Q barrier
changes in c) are interpreted as random mutations, and
not as changes to neighbors of the least fit specimen in
the population as in the MFBS model.

Since, as in the BS model, after a transient, the
model fitness barrier distribution self-organizes into a step
at $B_c = 1/(1 + Q)$, its asymptotic mean fitness is
$(B_c + 1)/2 = (Q + 2)/[2(Q + 1)]$, while the mean fitness
of the original uniform barrier distribution is 0.5.
Therefore, in this model the asymptotic fitness relative to the
ancestor is

$$(RF)_\infty = \frac{Q + 2}{Q + 1}. \qquad (1)$$

We select the value of Q so as to adjust $\overline{(RF)}_\infty \approx$
1.48. Hence, from Eq. (1) we obtain $Q \approx 1.1$. It is interesting
to note that the model suggests an approximately equal 
number of fitness driven and random mutations for the 
E.coli under the conditions of Lenski's experiment. 
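
For reference, the inversion that yields this value of Q can be written out explicitly. The steps below assume only the asymptotic relation $(RF)_\infty = (Q+2)/(Q+1)$ and the measured value 1.48:

```latex
\begin{align}
(RF)_\infty &= \frac{Q+2}{Q+1} = 1 + \frac{1}{Q+1}, \\
Q &= \frac{1}{(RF)_\infty - 1} - 1 = \frac{2 - (RF)_\infty}{(RF)_\infty - 1}, \\
Q\,\big|_{(RF)_\infty = 1.48} &= \frac{0.52}{0.48} \approx 1.08 \approx 1.1 .
\end{align}
```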

While Q determines the asymptotic value of $\overline{(RF)}_\infty$,
the number of cells, N, is determined by
the empirically observed mutation rate per replication, $\mu$,
which was estimated as $\mu \approx 0.002$ [7, 8].
Since in the model we have on the average Q + 1 = 2.1
changes per generation among N cells, matching
$(Q + 1)/N \approx \mu$ requires the number of cells in the
simulation to be $N \approx 1000$. In this way, the two
parameters of the model, Q and N, are fixed so as to
reproduce the asymptotic fitness and the mutation rate
observed in the experiment. The simulations
were performed for Q = 1.1 and for several values
of N in the interval 500 < N < 2000.
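
The estimate of N follows from equating the fraction of cells changed per generation with the mutation rate per replication. A minimal sketch of this arithmetic, with an illustrative function name and the values quoted in the text:

```python
def cells_for_mutation_rate(changes_per_step: float, mutation_rate: float) -> float:
    """Number of cells N such that changes_per_step / N equals mutation_rate."""
    return changes_per_step / mutation_rate


Q = 1.1      # random changes per step, from the asymptotic fitness
MU = 0.002   # estimated mutation rate per replication

n_cells = cells_for_mutation_rate(Q + 1, MU)
print(round(n_cells))
```

The result is of the order of the value N = 1000 used in the simulations, and it lies inside the interval of N explored.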

The agreement between the model and experiment
is quite reasonable. In Fig. 1 we plot the (RF) trajectory
every 500 generations for N = 1000 (□) and N = 1500
(△), together with the best hyperbolic fits of Lenski et al.
to 3 of the 12 sets of data for the E.coli experiments.

FIG. 1. Trajectories for mean fitness relative to the original
ancestor during 10,000 generations. Averages from 1000
numerical simulations for Q = 1.1 for N = 1000 (squares) and
N = 1500 (triangles) compared with the 3 best hyperbolic fits
(lines) to data of 3 of the 12 corresponding experiments.

In order to analyze the initial rapid growth of (RF),
in Fig. 2 we plot the experimental data and the model
results for the first 2000 generations, every 100 generations.
Once again notice the overall agreement of the
model with the experimental E.coli data.




FIG. 2. Finer scale analysis of the trajectories of the mean 
fitness relative to the ancestor for the 2000 initial generations. 
Experimental data (filled circles), and model averages from 
1000 numerical simulations with Q = 1.1: N = 1000 (squares) and
N = 1500 (triangles). The standard deviations of the experimentally
measured relative fitness are indicated as error bars
on the data points. The model calculations have negligible
dispersion on the scale of the plots presented in this work.



It has been suggested that periods of stasis,
characteristic of punctuated evolution, might be present in the
data. Although the relatively large experimental error
bars could make the data consistent with a monotonic
increase like the one predicted by the model, we believe
that further studies are needed to settle this issue.

The results presented thus far assume that there is
no neighbor relation between different strains of E.coli in
the system. However, since interdependencies among species
arise in other ecological systems, one should explore
the possibility that they also exist in this case. To do
this, we think it is illustrative to present here also the
results of a variation of the standard BS-type model. In
this variation, as in the original model, at each time step
the changes occur at the cell with the lowest barrier and its
two neighbors. In addition to this fitness driven form of
evolution we include, with probability p per time step, a
similar change in a randomly chosen cell. In this way the
number of changes per time step, besides that of the cell
with minimum barrier, is Q = 2 + p (the two
neighbors of the cell with minimum barrier plus, with
probability p, a cell at a random location). We denote
this version of the model as BS+p.
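
A minimal sketch of one way to implement the BS+p update is given below. It assumes periodic boundary conditions (the usual choice for the BS model, not stated explicitly here), the same initial barrier 0.5 for all cells as in the mean field version, and a relative fitness defined as the mean barrier divided by 0.5. Function name, defaults and seed are illustrative.

```python
import numpy as np


def simulate_bs_plus_p(n_cells=1500, p=0.2, n_steps=10_000, seed=0):
    """Return the relative fitness after each step of the BS+p sketch."""
    rng = np.random.default_rng(seed)
    ancestor_fitness = 0.5
    barriers = np.full(n_cells, ancestor_fitness)

    rf = np.empty(n_steps + 1)
    rf[0] = 1.0
    for step in range(1, n_steps + 1):
        # Fitness driven change: lowest barrier and its two neighbors
        # (periodic boundaries are an assumption of this sketch).
        i = int(np.argmin(barriers))
        hit = [(i - 1) % n_cells, i, (i + 1) % n_cells]
        # Random mutation: with probability p, one cell at a random location.
        if rng.random() < p:
            hit.append(int(rng.integers(n_cells)))
        barriers[hit] = rng.random(len(hit))
        rf[step] = barriers.mean() / ancestor_fitness
    return rf


rf_bs = simulate_bs_plus_p()
print(rf_bs[::2000])
```

Setting p = 0 in this sketch recovers the standard nearest-neighbor BS update.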

The stationary properties of this BS+p model will
be presented elsewhere [10]. We have observed that the
barrier distribution self-organizes into a step function at a
position $B_c$ which decreases as the parameter p increases,
from $B_c \approx 0.667$ for p = 0 (the standard BS model) to
$B_c \approx 0.22$ for p = 1. Thus $\overline{(RF)}_\infty$ lies in the interval
between 1.22 (for p = 1) and 1.667 (for p = 0). We
found that $\overline{(RF)}_\infty \approx 1.48$ may be adjusted taking $p \approx$
0.2. Hence, here the purely random mutations are taken
with a weight proportional to 0.2, while those mutations
related to natural selection are proportional to 3.
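
The quoted interval follows from the step-shaped stationary distribution: if the barriers are uniformly distributed between $B_c$ and 1, their mean is $(B_c + 1)/2$, and dividing by the ancestral value 0.5 gives a relative fitness of $1 + B_c$. The sketch below checks this against the threshold values quoted above; the function name is illustrative.

```python
def asymptotic_rf_from_threshold(b_c: float, ancestor_fitness: float = 0.5) -> float:
    """Relative fitness for barriers uniform on [b_c, 1]: ((b_c + 1) / 2) / ancestor."""
    return (b_c + 1.0) / 2.0 / ancestor_fitness


for label, b_c in (("p = 0", 0.667), ("p = 1", 0.22)):
    print(label, round(asymptotic_rf_from_threshold(b_c), 3))

# Mean field case for comparison: B_c = 1 / (1 + Q) with Q = 1.1
print("MFBS", round(asymptotic_rf_from_threshold(1 / (1 + 1.1)), 3))
```

Under the same assumption, an asymptotic relative fitness of 1.48 corresponds to a threshold of about 0.48 in either version of the model.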

In this BS+p version, analogously to the MFBS
version, while p determines the asymptotic value of
$\overline{(RF)}_\infty$, the number of cells, N, is determined
by fitting the estimated $\mu \approx 0.002$.
Since we now have, on the average, 3.2 changes per time
step, in order to get roughly the same mutation rates
in the simulation and in the experiment ($3.2/N \approx \mu$) we
should take $N \approx 1500$. In Fig. 3 we plot the (RF) trajectory every
500 generations for N = 1000 (□) and N = 1500 (△)
and 3 different hyperbolic fits of [1]. Notice the good
agreement with the Lenski et al. hyperbolic fit to the data
corresponding to the "A-1" experiment [1].

FIG. 3. Trajectories for mean fitness relative to the original
ancestor during 10,000 generations, in the BS+p model. Aver- 
ages from 1000 numerical simulations for p = 0.2 for N = 1500 
(squares) compared with the data (filled circles) and their best 
hyperbolic fit (line). See text for further details. 



In the case of the E.coli, the MFBS model appears, 
in principle, more reasonable, as it does not seem plausi- 
ble to have interdependencies among closely related bac- 
terial strains. The BS+p model could be applicable to 
other situations, where different microorganisms coexist 
and are interdependent. Our point here was to show that 
it is not possible to distinguish between the two models
based solely on the measurements of the evolution of the 
fitness. 

To conclude, when considered during the transient
from the initial ordered distribution, BS models with
random mutations were shown to reproduce the
experimental results of Lenski and co-workers. The
inclusion of random mutations, besides making the models
more realistic, is required to get quantitative agreement
with the experimental results, both for the transient and
the asymptotic regime. While both fitness driven and
random mutations were shown to be needed, their
relative importance remains an open question. One should
remark that the calculations presented here are just a
starting point in the exploration of this complex
biological system. In particular, the existence of stasis regions,
suggested by the data, remains another open issue.